Bicomponents and the robustness of networks to failure 
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A common definition of a robust connection between two nodes in a network such as a communi- 
cation network is that there should be at least two independent paths connecting them, so that the 
failure of no single node in the network causes them to become disconnected. This definition leads 
us naturally to consider bicomponents, subnetworks in which every node has a robust connection 
of this kind to every other. Here we study bicomponents in both real and model networks using a 
combination of exact analytic techniques and numerical methods. We show that standard network 
models predict there to be essentially no small bicomponents in most networks, but there may be 
a giant bicomponent, whose presence coincides with the presence of the ordinary giant component, 
and we find that real networks seem by and large to follow this pattern, although there are some 
interesting exceptions. We study the size of the giant bicomponent as nodes in the network fail, 
using a specially developed computer algorithm based on data trees, and find in some cases that 
our networks are quite robust to failure, with large bicomponents persisting until almost all vertices 
have been removed. 



The robustness of connections in networks has been 
studied extensively in the physics community, in part be- 
cause of its practical importance in settings such as com- 
munications networks and epidemiology [l|, 0, H[. The 
typical approach is to consider the largest set of vertices 
in a network that are connected to one another by at 
least one path, the so-called giant component, and exam- 
ine how its size varies as vertices are removed from the 
network. This can be thought of as a simple model for 
the performance of, for instance, a communication net- 
work such as the Internet under failure of vertices. The 
vertices removed can be chosen at random, as if failing 
because of random technical faults, or in a targeted fash- 
ion, as if an adversary were deliberately removing them 
in an effort to destroy network connectivity. 

In real-world situations, however, it is often consid- 
ered inadequate to rely for communication on just a sin- 
gle path between vertices. Most large organizations, for 
example, connect to the Internet using a strategy called 
multihoming, in which they maintain two or more in- 
dependent data connections to the network so that the 
failure of any one connection will not leave them dis- 
connected. More generally, the connection between two 
vertices is considered robust if there are at least two in- 
dependent paths between them, so that the failure of no 
single vertex in the network can cause the two to be dis- 
connected. This leads us to an obvious generalization of 
previous approaches to robustness, in which we focus on 
the set or sets of vertices in a network that are connected 
by at least two paths. Such sets are called bicomponents. 
In this paper we study the robustness of networks, in- 
cluding both real and model networks, in terms of the 
size and behavior of their bicomponents. 

Two paths connecting the same pair of vertices in a 
network are said to be vertex-independent, or simply in- 
dependent, if they share none of the same vertices other 
than the starting and ending vertices. A fc-component is 
a maximal subset of the vertices of a network such that 



every vertex in the subset is connected to every other by 
k independent paths [4(. For the special cases of k = 2, 3 
the /c-components are also called bicomponents and tri- 
components, respectively. Note that not all vertices need 
belong to a fc-component for k > 2, which contrasts with 
the case for ordinary components (k = 1), where every 
vertex belongs to a component. 

The vertices in a bicomponent have the property that 
no two can be disconnected by the failure of any other 
single vertex. Another observation that will become im- 
portant shortly is that the fc-components of a network 
are nested. That is, every bicomponent is, trivially, a 
subset of an ordinary component, every tricomponent is 
a subset of a bicomponent, and so forth. In this paper 
we concentrate primarily on bicomponents, although we 
will give some results for k > 3 where appropriate. 

To gain an understanding of the behavior of network 
bicomponents, let us look first at a standard model net- 
work, the widely studied "configuration model," which is 
a network chosen uniformly at random from the set of 
all networks with a given degree sequence 0, @, 01 ■ The 
probability of an edge falling between two vertices i and j 
in such a network is kikj/2m, where ki is the degree of 
vertex i and m is the total number of edges in the net- 
work. The configuration model is a useful guide to the 
expected qualitative behavior of many network statistics, 
and in particular provides good insight into the behav- 
ior of ordinary components (1-components), having at 
most one giant component of size 0(n) and 0(n) small 
components of size 0(1), a pattern that is seen in most 
real-world networks as well. 

A first interesting result to note is that, by contrast 
with the case for 1-components, there are in general (al- 
most) no small bicomponents in the configuration model. 
To see this consider the small 1-components of the con- 
figuration model, which, as shown elsewhere, are gener- 
ally tree-like, meaning they are not bicomponents. In 
order to turn them into bicomponents, we would need to 
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add at least one edge to the tree, thereby closing a loop 
and creating two paths between some vertices. Since the 
small components have size 0(1) and hence also 0(1) 
pairs of vertices, there are 0(1) opportunities to perform 
such a closure in each small component. Each closure 
occurs with probability kikjjlm = 0(n _1 ) on a sparse 
network and hence the total probability of converting a 
small component into a bicomponent is 0(n _1 ). Since 
there are 0(n) small components, this means that the 
total number of small bicomponcnts formed in this way 
is 0(1). Small bicomponcnts can also be formed out of 
tree-like subsets of the giant component, but a similar 
argument shows that such bicomponents are also 0(1) in 
number. 

Thus in the limit of large network size the probabil- 
ity that a randomly chosen node belongs to a small bi- 
component vanishes as 1/n. Speaking loosely, there are 
no small bicomponcnts in the network. 

There may, however, be a giant bicomponent, and 
moreover it turns out to be possible to calculate the ex- 
pected size of the giant bicomponent exactly in the limit 
of large graph size. In order for a vertex to belong to the 
giant bicomponent at least two of the edges incident on 
that vertex must lead to the giant bicomponent by in- 
dependent paths. However, since the giant bicomponent 
is a subset of the giant 1-component, it follows that any 
edge that leads to the giant 1-component also leads to 
the giant bicomponent, so it will suffice that at least two 
of our vertex's neighbors be in the giant 1-component. 

The probability that the neighbor of a vertex belongs 
to the giant component is straightforward to calculate [7| . 
Let the degree distribution of our network be pk , meaning 
that a randomly chosen vertex has degree k with prob- 
ability pk- If we choose an edge at random and follow 
it to one of the vertices at its ends, then the number of 
edges incident on that vertex, other than the one we ar- 
rived along, follows a different distribution, the so-called 
excess degree distribution: 
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as shown in Q, for example. Here (k) = ^ fe kp^ is the 
mean degree for the entire network. 

Now let u be the probability that upon following an 
edge we reach a vertex that does not belong to the giant 
component. In order for this to be the case, it must be 
that none of the other edges attached to that vertex lead 
to vertices in the giant component, which happens with 
probability u k , where k is the excess degree. Averaging 
over the distribution q^ of fc, we then find that 
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(2) 



where G\{z) = J2k < lkZ k is the probability generating 
function for q^. If the network is to have a giant compo- 
nent this equation must have a nontrivial solution u < 1. 
It is straightforward to show that this occurs if G[ (1) > 1, 



which leads to the well-known criterion of Molloy and 
Reed @ for the existence of the giant component. 

Armed with these results we can now write down the 
probability that a randomly chosen vertex belongs to the 
giant bicomponent. The probability that it does not is 
the probability that either zero or one, but not more, of 
its edges lead to vertices in the giant component, which 
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= G (u) + (l-u)G' (u), 
(3) 

where Gq(z) = ^2 k PkZ k is the probability generating 
function for p k . Then the probability S2 that the vertex 
is in the giant bicomponent is one minus this quantity: 



S 2 = l-G {u)-(l-u)G' (u). 



(4) 



Alternatively, S2 is the expected size of the giant bi- 
component as a fraction of the size of the entire network. 

We have not, in this derivation, demonstrated that the 
two paths from our vertex to the giant bicomponent are 
independent. However, since the diameter of a random 
graph is O(lnn) @, and since the diameter provides an 
upper bound on the lengths of the paths in our derivation, 
the probability of their intersecting is 0(n -1 In 2 n), which 
vanishes as n — ► 00, so our result will be correct provided 
that our network is large. 

There can of course be no giant bicomponent when 
there is no giant 1-component, since the former is a sub- 
set of the latter. Equation (fj| shows that, in general, the 
reverse is also true: when there is a giant 1-component 
there is also a giant bicomponent. Only in the special 
case where 1 — Go(u) = (1 — u)G' a (u) can the giant bi- 
component vanish. 

Thus the giant bicomponent appears, in general, at the 
same time as the giant component: for any specific family 
of degree distributions, if there is a transition marking the 
appearance of the giant component, the same transition 
also marks the appearance of the giant bicomponent. 

Equation (fj| can be generalized straightforwardly to 
fc-components for general fc. The general result is 
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which implies that in fact fc-components for all k appear 
at the same time as the giant 1-component. 

Since the presence of large fc-components is, as we have 
said, a desirable property for many kinds of networks, it 
is instructive to ask how robust the property is to the 
removal of vertices or edges. We already know, from the 
results above, that if we start removing vertices from a 
network, the giant fc-component will vanish at the same 
time as the giant component does. Thus the thresholds 
for the complete destruction of robust connectivity at all 
orders coincide. This however does not tell the whole 
story. Using a simple variant of the arguments above we 
can also calculate the size of the giant fc-component as 
vertices are removed. 
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Let r k be the probability that a vertex of degree k is 
operational (i.e., it hasn't failed or been removed from 
the network). Common choices for r k are r k = constant, 
which corresponds to uniformly random failure of ver- 
tices, or r k = 0(k max — k), where 9(x) is the Heaviside 
step function, which corresponds to removal of all ver- 
tices with degree k > /c max , a form of targeted attack 
against the best-connected vertices 

Then the probability u that an edge does not lead to 
a vertex in the giant component, if that vertex has ex- 
cess degree k (or total degree k + 1), is the probability 
1— r k +i that the vertex has been removed plus the proba- 
bility r k+ \u k that it has not been removed but that none 
of its other edges lead to the giant component either. Av- 
eraging over the distribution q k of the excess degree, we 
then find 
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1-Ji(l) + Fi(ti), 



(6) 



where F\(z) = J^ fc q k r k+ iz k . Then, by an argument sim- 
ilar to the one leading to Eq. (O, the expected size of the 
giant /c-component is 
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where F (z) = J2 k Pk r kZ k - 

The percolation transition at which the giant 1- 
component in a network is destroyed is typically second- 
order in nature. For fc-components with k > 2 on 
the other hand, we can show using the results above 
that the transition is of higher order. Consider for in- 
stance the case of uniform random failure of vertices, for 
which r k = (f> independent of k. Then Eq. ^ becomes 
u = 1 — (f> + 4>G\ (u) and, writing u = 1 — e and making 
use of Gi(l) = 1, we find 



0[eGUl)-|e 2 Gi'(l)] + O(e 3 ) 



or, rearranging, 
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(8) 



(9) 



In other words, 1 — u is linear in 4> — ip c , where the critical 
occupation probability is <f> c = 1/G[(1) 0]. 

Now consider Eq. (J7JI for the size of the giant k- 
component and let us rewrite it as follows. Performing a 
Taylor expansion of F (z) about z = iiwe find 
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Setting z — 1 and making use of |(7J), we then find that 
x ' (1 - u) m d m F 



S k = 



m—k 



rn I 



dz r ' 



(11) 



The leading order term in this expression is 0(1 — u) k 
and hence, by Eq. ©, S k = 0(<f> - <j) c ) k . 



network 


n 


Si 


s 2 


bicomp. 


Internet (AS) 


22 963 


1 


0.651 


0.012 


world wide web 


325 729 


1 


0.414 


0.076 


power grid 


4941 


1 


0.615 


0.062 


C Elegans neural 


297 


1 


0.949 





C. Elegans metabolic 


453 


1 


0.934 


0.049 


physics collaborations 


16 726 


0.829 


0.588 


0.243 


network scientists 


1589 


0.239 


0.084 


0.634 


friendship 


795 


0.979 


0.940 





dating 


573 


0.503 


0.072 


0.014 



TABLE I: Statistics of a number of real-world networks. The 
second to fifth columns give the number of vertices in the net- 
work, the fractions occupied by the largest component and bi- 
component, and the fraction occupied by small components. 
The networks are, in order, a snapshot of the Internet topol- 
ogy at the autonomous system (AS) level, the symmetrized 
web graph of a university web sit e [121 . the Western United 
States power grid [Tl| . the neural [III ] and metabolic [|| net- 
works of the nematode C. Elegans, coauthorship networks of 
physicists [TJJ] and network scientists [TJ|], and friendship and 
dating networks from a study of US school students [l4| . 



Thus the system displays a (potentially) infinite series 
of continuous phase transitions marking the appearance 
of the giant components for different values of k, each oc- 
curring at the same point <p c but each of different order, 
the transition for any given k being of (k + l)th order 
in <p. This means that although, for example, the giant 
bicomponent does appear at the same moment as the or- 
dinary giant component, its appearance is a third-order 
transition and hence it initially grows in size at a far 
slower rate than the giant component so that, in prac- 
tice, the network does not achieve a significant level of 
robustness in the sense considered here until considerably 
above <p c . This behavior is illustrated in the left-hand 
panel of Fig. [H where we show the size of the largest 1- 
and 2-components in random graphs with two different 
degree distributions, along with simulation results for the 
same networks. The second- and third-order natures of 
the phase transitions can be clearly seen. 

Armed with the theoretical insights afforded by the 
configuration model, let us turn now to the behavior of 
bicomponents in real- world networks. All bicomponents 
in a network can be found in time O(n) using the depth- 
first search based algorithm of Hopcroft and Tar j an p~5| . 
Table U summarizes the results of applying this algorithm 
to a variety of previously documented networks. The 
table reveals some interesting features. First we note 
that, with two exceptions, the networks all have large gi- 
ant bicomponents. Certainly the giant bicomponents are 
smaller than the giant components, but in each case the 
networks have a substantial fraction of robust connec- 
tions in the sense considered here. The two exceptions 
are the collaborations of network scientists and the high- 
school dating network. The former has quite a small gi- 
ant component, so the giant bicomponent cannot be very 
large, although it is still quite a small fraction of the giant 



4 




occupation probability (j) 



FIG. 1: Left panel: Size of giant component (k = 1) and giant bicomponent (k — 2) as vertices are randomly removed from 
random graphs with exponential (e _Afe with A = 0.4) and Poisson (mean 1.5) degree distributions. Solid lines are the analytic 
solutions; points are average numerical results for 100 simulations each on networks with 10 6 vertices. Error bars are smaller 
than the points in all cases. Right panel: Size of giant bicomponent as vertices are randomly removed from three real-world 
networks, a metabolic network for C. Elegans a collaboration network of scientists working in condensed matter physics 
and the Western States Power Grid of the United States [ll[. 



component size. The dating network, however, is clearly 
anomalous, having a very substantial giant component 
but a very small giant bicomponent. It is interesting to 
consider whether there might be sociological reasons for 
this anomaly. 

Second, we note that in all but two cases the networks 
have no small bicomponents, or very nearly none. This 
observation agrees well with our calculations for random 
graphs above, but is otherwise somewhat surprising. It 
has been observed that most real-world networks contain 
a high density of short loops [ll[ , a feature that random 
graphs lack and one that might be expected to give rise 
to small bicomponents. Our observations imply, how- 
ever, that in most cases the loops are attached to the 
giant bicomponent, rather than forming independent bi- 
components, and that most portions of the network not in 
the giant bicomponent are tree-like. This agreement with 
the random graph model stands in sharp contrast with 
studies of other network properties such as clustering and 
assortative mixing, in which random graphs match real 
networks poorly. Note also the exception to the over- 
all pattern provided by the two collaboration networks: 
both have quite significant fractions of their vertices in 
small bicomponents, a feature possibly rooted in the par- 
ticular social structure of scientific research. 

It is also interesting to examine the robustness of the 
bicomponent structure to removal of vertices — either ran- 
dom or targeted — as we did for our model networks. The 
Hopcroft-Tarjan algorithm is a poor choice for this calcu- 
lation, since we would have to perform n runs of the algo- 
rithm to find the bicomponents after the removal of each 
vertex, for a total running time 0(n 2 ). For the larger 



networks studied this is prohibitive, so instead we have 
developed a different algorithm that allows the calcula- 
tion to be performed much faster. The algorithm is simi- 
lar in spirit to the fast percolation algorithm of Newman 
and Ziff [lj| and will be discussed in detail elsewhere. 
Here we give only a brief description of its working. 

The algorithm starts with an empty network and adds 
vertices, rather than taking them away, and avoids find- 
ing the bicomponents anew after each addition by calcu- 
lating only the change in the bicomponent structure from 
the previous step, which is usually minor. The algorithm 
stores the structure of the components and bicomponents 
in two separate "forest" data structures (i.e., sets of trees) 
with one tree for each (bi)component. Vertices within 
components contain pointers that point to others in the 
same component, such that by following a sequence of 
such pointers we can reach the root of the tree, thereby 
identifying the component uniquely. As each new ver- 
tex is added to the network we check in this way to 
which components its neighbors belong, amalgamating 
those components if necessary by adding a pointer from 
the root of one tree to the root of the other. If the added 
vertex joins neighbors that already belong to the same 
component, a loop and hence new bicomponent has been 
created, or old bicomponents extended or joined, and the 
bicomponent trees are updated appropriately. The tree 
traversals employed by the algorithm take O(lnn) time 
on average and hence the algorithm can add all n ver- 
tices in a total running time O(nlnn). In practice, this 
gives an improvement in running time of a factor 100 
or more over the Hopcroft-Tarjan algorithm for the net- 
works studied here, and renders the calculations easily 
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doable on a standard desktop computer. 

The right-hand panel of Fig. [1] shows the results of 
the application of this algorithm to three of the networks 
from Table U the metabolic network, the physicist col- 
laborations, and the power grid. For each network there 
appears to be a transition point below which the giant 
bicomponent is destroyed and the network can no longer 
be said to be robustly connected. For two of the net- 
works, however, the transition appears to be at or close 
to 4> = 0) indicating that the networks are highly ro- 
bust in the sense considered here: nearly all the vertices 
have to be removed from the network before the giant 
bicomponent is destroyed. The third network, the power 
grid, shows a much higher transition probability, indicat- 
ing that this network is relatively fragile to random node 
removal. On the other hand, we also see in all cases that 
the transition at which the giant bicomponent appears is 
a gradual one; the gradient at the transition is shallow — 
perhaps even zero — so that the giant bicomponent grows 
very slowly at first above the transition. 

Taken together, the analytic and numerical results pre- 
sented here give an interesting picture of the behavior 



of network bicomponents. Real-world networks appear 
to be quite robust in the sense of having large giant 
bicomponents and moreover the existence of these bi- 
components is in some cases (though not all) itself ro- 
bust to the deletion of vertices. In practice, however, 
although the giant bicomponent may persist as vertices 
are removed from the network, its size dwindles rapidly 
so that large portions of the network lose robust con- 
nection considerably before the transition point at which 
the giant bicomponent finally vanishes. In each of these 
respects the behavior of our networks is surprisingly sim- 
ilar to the behavior of the exactly solvable configuration 
model, which predicts a giant bicomponent that persists 
down to the point at which the ordinary giant component 
disappears, but with an unusual third-order transition at 
that point that ensures that the size of the bicomponent 
will be small as we approach the transition. 

The authors thank Cris Moore for useful conversations. 
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